NASM X86 Architecture
x86 Architecture

The x86 architecture has 8 General-Purpose Registers (GPR), 6 Segment Registers, 1 Flags Register and an Instruction Pointer.

'''General-Purpose Registers (GPR)'''

The 8 GPRs are:
# Accumulator register (AX). Used in arithmetic operations.
# Counter register (CX). Used in shift/rotate instructions and loops.
# Data register (DX). Used in arithmetic operations and I/O operations.
# Base register (BX). Used as a pointer to data (located in the segment selected by DS, when in segmented mode).
# Stack Pointer register (SP). Pointer to the top of the stack.
# Stack Base Pointer register (BP). Used to point to the base of the stack.
# Source register (SI). Used as a pointer to a source in stream operations.
# Destination register (DI). Used as a pointer to a destination in stream operations.

The order in which they are listed here is for a reason: it is the same order that is used in a push-to-stack operation (the PUSHA/PUSHAD instruction), which will be covered later.

All registers can be accessed in 16-bit and 32-bit modes. In 16-bit mode, the register is identified by its two-letter abbreviation from the list above. In 32-bit mode, this two-letter abbreviation is prefixed with an 'E'. For example, 'EAX' is the accumulator register as a 32-bit value.

It is also possible to address the first four registers (AX, CX, DX and BX) in 16-bit mode as two 8-bit halves. The least significant byte (LSB), or low half, is identified by replacing the 'X' with an 'L'. The most significant byte (MSB), or high half, uses an 'H' instead. For example, CL is the LSB of the counter register, whereas CH is its MSB.

In total, this gives us four ways to access the accumulator, counter, data and base registers: 32-bit, 16-bit, 8-bit LSB, and 8-bit MSB. The other four are accessed in only two ways: 32-bit and 16-bit. The following table summarises this:

{| class="wikitable"
! Register !! Accumulator !! Counter !! Data !! Base !! Stack Pointer !! Stack Base Pointer !! Source !! Destination
|-
! 32-bit
| EAX || ECX || EDX || EBX || ESP || EBP || ESI || EDI
|-
! 16-bit
| AX || CX || DX || BX || SP || BP || SI || DI
|-
! 8-bit MSB
| AH || CH || DH || BH || n/a || n/a || n/a || n/a
|-
! 8-bit LSB
| AL || CL || DL || BL || n/a || n/a || n/a || n/a
|}

'''Segment Registers'''

The 6 Segment Registers are:
* Stack Segment (SS). Pointer to the stack.
* Code Segment (CS). Pointer to the code.
* Data Segment (DS). Pointer to the data.
* Extra Segment (ES). Pointer to extra data ('E' stands for 'Extra').
* F Segment (FS). Pointer to more extra data ('F' comes after 'E').
* G Segment (GS). Pointer to still more extra data ('G' comes after 'F').

Most applications on most modern operating systems (like Linux or Microsoft Windows) use a memory model that points nearly all segment registers to the same place (and uses paging instead), effectively disabling their use. FS and GS are the usual exception to this rule: they are typically used to point at thread-specific data.

'''EFLAGS Register'''

EFLAGS is a 32-bit register whose individual bits record the results of operations and control the state of the processor.

The names of these bits are:

{| class="wikitable"
! Bit
| 31-22 || 21 || 20 || 19 || 18 || 17 || 16 || 15 || 14 || 13-12 || 11 || 10 || 9 || 8 || 7 || 6 || 5 || 4 || 3 || 2 || 1 || 0
|-
! Name
| 0 || ID || VIP || VIF || AC || VM || RF || 0 || NT || IOPL || OF || DF || IF || TF || SF || ZF || 0 || AF || 0 || PF || 1 || CF
|}

The bits named 0 and 1 are reserved bits and shouldn't be modified.

'''Instruction Pointer'''

The EIP register contains the address of the '''next''' instruction to be executed if no branching is done.

EIP can only be read through the stack after a call instruction.

'''Memory'''

The x86 architecture is [http://en.wikipedia.org/wiki/Little_endian little-endian], meaning that multi-byte values are written least significant byte first. (This refers only to the ordering of the bytes, not to the bits.)

So the 32-bit value B3B2B1B0<sub>16</sub> on an x86 would be represented in memory as:

{| class="wikitable"
|+ '''Little endian representation''' (lowest address first)
| B0 || B1 || B2 || B3
|}

For example, the 32-bit double word 0x1BA583D4 (the '''0x''' denotes hexadecimal) would be written in memory as:

{| class="wikitable"
|+ '''Little endian example''' (lowest address first)
| D4 || 83 || A5 || 1B
|}

This will be seen as 0xD4 0x83 0xA5 0x1B when doing a memory dump.

'''Two's complement representation'''

Two's complement is the standard way of representing negative integers in binary. A number's sign is changed by inverting all of the bits and adding one.
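
The invert-and-add-one rule can be sketched with NASM instructions. This is only a minimal illustration, assuming 16-bit registers; the choice of registers and of the starting value 1 is arbitrary and not taken from the original text.

 '''mov''' ax, 1 ''; AX = 0x0001 (decimal 1)''
 '''not''' ax ''; invert all of the bits: AX = 0xFFFE''
 '''inc''' ax ''; add one: AX = 0xFFFF (decimal -1)''
 '''mov''' bx, 1 ''; BX = 0x0001''
 '''neg''' bx ''; NEG performs both steps in a single instruction: BX = 0xFFFF''

Both sequences leave the same bit pattern in the register, so NEG is the usual way to change a sign in practice.
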
 0001 represents decimal 1
 1111 represents decimal -1

'''Addressing modes'''

The addressing mode indicates the manner in which an operand is accessed.

;'''Register Addressing'''
:(operand address R is in the address field)
 '''mov''' ax, bx ''; moves contents of register bx into ax''

;'''Immediate'''
:(actual value is in the field)
 '''mov''' ax, 1 ''; moves value of 1 into register ax''
or
 '''mov''' ax, 0x010C ''; moves value of 0x010C into register ax''

;'''Direct memory addressing'''
:(operand address is in the address field)
 '''mov''' ax, '''['''102h''']''' ''; Actual address is DS:0 + 102h''

;'''Direct offset addressing'''
:(uses arithmetic to modify address)
 byte_tbl '''db''' 12,15,16,22,..... ''; Table of bytes''
 '''mov''' al,'''['''byte_tbl+2''']'''
 '''mov''' al,byte_tbl'''['''2''']''' ''; same as the former, but MASM-style syntax: NASM accepts only the bracketed form above''

;'''Register Indirect'''
:(field points to a register that contains the operand address)
 '''mov''' ax,'''['''di''']'''
:The registers used for indirect addressing in 16-bit mode are BX, BP, SI, DI

;'''Base-index'''
 '''mov''' ax,'''['''bx + di''']'''
:For example, if we are talking about an array, BX contains the address of the beginning of the array, and DI contains the index into the array.

;'''Base-index with displacement'''
 '''mov''' ax,'''['''bx + di + 10''']'''

Stack

The stack is a Last In First Out (LIFO) data structure; data is pushed onto it and popped off of it in the reverse order.

 '''mov''' ax, 006Ah
 '''mov''' bx, 0F79Ah ''; a hexadecimal constant with the h suffix must start with a digit''
 '''mov''' cx, 1124h
 '''push''' ax
You push the value in AX onto the top of the stack, which now holds the value $006A.

 '''push''' bx
You do the same thing to the value in BX; the stack now has $006A and $F79A.

 '''push''' cx
Now the stack has $006A, $F79A, and $1124.

 '''call''' do_stuff
Do some stuff. The function is not forced to save the registers it uses, which is why we save them ourselves.

 '''pop''' cx
Pop the last element pushed onto the stack into CX, $1124; the stack now has $006A and $F79A.

 '''pop''' bx
Pop the last element pushed onto the stack into BX, $F79A; the stack now has just $006A.

 '''pop''' ax
Pop the last element pushed onto the stack into AX, $006A; the stack is empty.

The Stack is usually used to pass arguments to functions or procedures and also to keep track of control flow when the call instruction is used. The other common use of the Stack is temporarily saving registers.

CPU Operation Modes

'''Real Mode'''

Real Mode is a holdover from the original Intel 8086. You generally won't need to know anything about it (unless you are programming for a DOS-based system or, more likely, writing a boot loader that is directly called by the BIOS).

The Intel 8086 accessed memory using 20-bit addresses. But, as the processor itself was 16-bit, Intel invented an addressing scheme that provided a way of mapping a 20-bit addressing space into 16-bit words. Today's x86 processors start in the so-called Real Mode, which is an operating mode that mimics the behavior of the 8086, with some very tiny differences, for backwards compatibility.

In Real Mode, a segment and an offset register are used together to yield a final memory address. The value in the segment register is multiplied by 16 (shifted 4 bits to the left) and the offset is added to the result. This provides a usable address space of 1 MB. However, a quirk in the addressing scheme allows access past the 1 MB limit if a segment address of 0xFFFF (the highest possible) is used; on the 8086 and 8088, all accesses to this area wrapped around to the low end of memory, but on the 80286 and later, up to 65520 bytes past the 1 MB mark can be addressed this way if the A20 address line is enabled. ''See: ''[http://en.wikibooks.org/wiki/X86_Assembly/16_32_and_64_Bits#The_A20_Gate_Saga ''The A20 Gate Saga''].

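
The segment and offset arithmetic described above can be sketched in NASM as follows. This is only a minimal illustration, not taken from the original text: the segment value 0xB800, the offset 0x0010 and the stored byte 0x41 are arbitrary assumptions chosen to keep the arithmetic easy to follow, and the code assumes it is running in Real Mode (for example, inside a boot sector).

 '''bits''' 16
 '''mov''' ax, 0xB800 ''; a segment register cannot be loaded with an immediate, so go through AX''
 '''mov''' es, ax ''; ES = 0xB800''
 '''mov''' di, 0x0010 ''; offset within the segment''
 '''mov''' byte '''['''es:di''']''', 0x41 ''; physical address = 0xB800 * 16 + 0x0010 = 0xB8010''
 '''mov''' ax, 0xFFFF
 '''mov''' ds, ax ''; DS = 0xFFFF, the highest possible segment''
 '''mov''' al, '''['''0xFFFF''']''' ''; 0xFFFF0 + 0xFFFF = 0x10FFEF, which is 65519 bytes past the 1 MB mark''

The last access is the highest address reachable this way, so the region past 1 MB spans 0x100000 through 0x10FFEF, which is 65520 bytes. On an 8086 or 8088 the same access would wrap around to 0x0FFEF.
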
One benefit shared by Real Mode segmentation and by [http://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture#Multi-Segmented_Memory_Model Protected Mode Multi-Segment Memory Model] is that all addresses must be given relative to another address (that is, the segment base address). A program can have its own address space and completely ignore the segment registers, and thus no pointers have to be relocated to run the program. Programs can perform ''near'' calls and jumps within the same segment, and data is always relative to segment base addresses (which in the Real Mode addressing scheme are computed from the values loaded in the Segment Registers).

This is what the DOS *.COM format does; the contents of the file are loaded into memory and blindly run. However, because Real Mode segments are always 64 KB long, COM files could not be larger than that (in fact, they had to fit into 65280 bytes, since DOS used the first 256 bytes of a segment for housekeeping data); for many years this wasn't a problem.

'''Protected Mode'''

'''Flat Memory Model'''

If you are programming for a modern operating system (such as Linux or Windows), you are basically programming in flat 32-bit mode. Any register can be used in addressing, and it is generally more efficient to use a full 32-bit register instead of a 16-bit register part. Additionally, segment registers are generally unused in flat mode, and it is generally a bad idea to touch them.

'''Multi-Segmented Memory Model'''